Developing a comprehensive framework for multimodal feature extraction
Feature extraction is a critical component of many applied data science
workflows. In recent years, rapid advances in artificial intelligence and
machine learning have led to an explosion of feature extraction tools and
services that allow data scientists to cheaply and effectively annotate their
data along a vast array of dimensions, ranging from detecting faces in images
to analyzing the sentiment expressed in coherent text. Unfortunately, the
proliferation of powerful feature extraction services has been mirrored by a
corresponding expansion in the number of distinct interfaces to feature
extraction services. In a world where nearly every new service has its own API,
documentation, and/or client library, data scientists who need to combine
diverse features obtained from multiple sources are often forced to write and
maintain ever more elaborate feature extraction pipelines. To address this
challenge, we introduce pliers, an open-source Python framework for
comprehensive multimodal feature extraction. Pliers supports standardized
annotation of diverse data types (video, images, audio, and text) and is
expressly designed with both ease-of-use and extensibility in mind.
Users can apply a wide range of pre-existing feature extraction tools to their
data in just a few lines of Python code, and can also easily add their own
custom extractors by writing modular classes. A graph-based API enables rapid
development of complex feature extraction pipelines that output results in a
single, standardized format. We describe the package's architecture, detail its
major advantages over previous feature extraction toolboxes, and use a sample
application to a large functional MRI dataset to illustrate how pliers can
significantly reduce the time and effort required to construct sophisticated
feature extraction workflows while increasing code clarity and maintainability.
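The design idea described above (modular extractor classes whose outputs all land in one standardized format) can be sketched in a few lines. This is a minimal, hypothetical illustration only: `Extractor`, `FeatureRecord`, `extract_all`, and the two toy extractors are invented names and are not the pliers API, and the flat loop stands in for, rather than reproduces, pliers' graph-based API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FeatureRecord:
    """One row of a shared output format: stimulus, extractor, feature, value."""
    stimulus: str
    extractor: str
    feature: str
    value: float


class Extractor:
    """Hypothetical base class: a custom extractor only implements _extract()."""
    name = "base"

    def _extract(self, stimulus: str) -> Dict[str, float]:
        raise NotImplementedError

    def transform(self, stimuli: List[str]) -> List[FeatureRecord]:
        # Wrap each subclass's raw dict output in the standardized record type.
        return [FeatureRecord(s, self.name, feat, val)
                for s in stimuli
                for feat, val in self._extract(s).items()]


class WordCountExtractor(Extractor):
    name = "word_count"

    def _extract(self, stimulus):
        return {"n_words": float(len(stimulus.split()))}


class CaseExtractor(Extractor):
    name = "case"

    def _extract(self, stimulus):
        return {"n_chars": float(len(stimulus)),
                "n_upper": float(sum(ch.isupper() for ch in stimulus))}


def extract_all(extractors, stimuli):
    """Run every extractor and merge all outputs into one flat list of records."""
    merged = []
    for ex in extractors:
        merged.extend(ex.transform(stimuli))
    return merged


texts = ["Hello world", "Pliers merges features"]
records = extract_all([WordCountExtractor(), CaseExtractor()], texts)
```

Because every extractor emits the same record type, adding a new feature source means writing one small subclass; downstream code that consumes `records` does not change.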
Nonparametric Feature Extraction from Dendrograms
We propose a nonparametric approach to feature extraction from dendrograms.
Minimax distance measures correspond to building a dendrogram with the single
linkage criterion and defining specific forms of a level function and of a
distance function over it. We therefore extend this method to arbitrary
dendrograms. We develop a generalized framework wherein different distance
measures can be inferred from different types of dendrograms, level functions
and distance functions. Via an appropriate embedding, we compute a vector-based
representation of the inferred distances, in order to enable many numerical
machine learning algorithms to employ such distances. Then, to address the
model selection problem, we study how to aggregate different dendrogram-based
distances, both in solution space and in representation space, in the
spirit of deep representations. In the first approach, for example for the
clustering problem, we build a graph with positive and negative edge weights
according to the consistency of the clustering labels of different objects
among different solutions, in the context of ensemble methods. Then, we use an
efficient variant of correlation clustering to produce the final clusters. In
the second approach, we combine different distances and features
sequentially, in the spirit of multi-layered architectures, to obtain the
final features. Finally, we demonstrate the effectiveness of our approach
through several numerical studies.
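The link between minimax distances and single-linkage dendrograms can be made concrete. In the sketch below, a Kruskal-style merge assigns each pair of points the level at which their clusters first join, which is their minimax (path-based) distance. Using classical multidimensional scaling as the "appropriate embedding" is an assumption made for illustration; the abstract does not say which embedding the authors use, and the function names are hypothetical.

```python
import numpy as np


def minimax_distances(X):
    """Minimax distances via a Kruskal-style single-linkage pass.

    When two clusters first merge at edge weight w, every pair (a, b) with a in
    one cluster and b in the other gets distance w: the single-linkage
    dendrogram level at which a and b join.
    """
    n = len(X)
    base = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # Euclidean
    D = np.zeros((n, n))
    members = {i: [i] for i in range(n)}  # cluster id -> member indices
    label = list(range(n))                # point index -> cluster id
    edges = sorted((base[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    for w, i, j in edges:
        ci, cj = label[i], label[j]
        if ci == cj:
            continue  # already in the same cluster: edge adds nothing
        for a in members[ci]:
            for b in members[cj]:
                D[a, b] = D[b, a] = w
        for b in members[cj]:  # merge cluster cj into ci
            label[b] = ci
        members[ci].extend(members.pop(cj))
    return D


def classical_mds(D, dim):
    """Vector representation of a distance matrix via classical (Torgerson) MDS."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dim]  # largest eigenvalues first
    vals = np.clip(vals[order], 0.0, None)  # drop tiny negative rounding errors
    return vecs[:, order] * np.sqrt(vals)


X = np.array([[0.0], [1.0], [5.0], [6.0]])
D = minimax_distances(X)            # two tight pairs joined at level 4
features = classical_mds(D, dim=3)  # rows are vector features for each point
```

Swapping in a different dendrogram, level function, or base distance only changes how `D` is built; the embedding step stays the same, which is the kind of decoupling the generalized framework relies on.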
Feature Extraction and Classification of Automatically Segmented Lung Lesion Using Improved Toboggan Algorithm
The accurate detection of lung lesions from computed tomography (CT) scans is essential for clinical diagnosis and provides valuable information for the treatment of lung cancer. However, achieving fully automatic lesion detection remains challenging. Here, we propose a novel segmentation algorithm: an improved toboggan algorithm with a three-step framework comprising automatic seed point selection, multi-constraint lesion extraction, and lesion refinement. Descriptors such as the local binary pattern (LBP), wavelet, contourlet, and grey level co-occurrence matrix (GLCM) are then applied to each region of interest of the segmented lung lesion image to extract texture features such as contrast, homogeneity, energy, and entropy, together with statistical features such as the mean, variance, standard deviation, and convolution of modulated and normal frequencies. Finally, support vector machine (SVM) and K-nearest neighbour (KNN) classifiers are applied to classify the abnormal region using the extracted features, and their performance is compared. The SVM classifier achieved an accuracy of 97.8%, outperforming the KNN classifier. This approach requires no human interaction for lesion detection. Thus, the improved toboggan algorithm can achieve precise lung lesion segmentation in CT images, and the extracted features also help to classify lung lesion regions efficiently.
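To make the GLCM texture descriptors named above concrete, here is a minimal NumPy sketch that builds a symmetric, normalized co-occurrence matrix for a single pixel offset and derives contrast, homogeneity, energy, and entropy, plus first-order statistics. It is not the authors' implementation. The single horizontal offset, the assumption that grey levels are already quantized to `0..levels-1`, and the specific definitions (energy as the angular second moment, homogeneity as `1 / (1 + (i - j)^2)`) are all assumptions; some libraries, such as scikit-image's `graycoprops`, report energy as the square root of this quantity.

```python
import numpy as np


def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized grey-level co-occurrence matrix for one offset."""
    img = np.asarray(image, dtype=int)  # assumes levels already quantized to 0..levels-1
    rows, cols = img.shape
    P = np.zeros((levels, levels))
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[img[r, c], img[r2, c2]] += 1
    P = P + P.T  # count each neighbour pair in both directions
    return P / P.sum()


def texture_features(P):
    """Haralick-style descriptors computed from a normalized GLCM."""
    i, j = np.indices(P.shape)
    nz = P[P > 0]  # skip empty cells so log2 is defined
    return {
        "contrast": float(np.sum(P * (i - j) ** 2)),
        "homogeneity": float(np.sum(P / (1.0 + (i - j) ** 2))),
        "energy": float(np.sum(P ** 2)),  # angular second moment
        "entropy": float(-np.sum(nz * np.log2(nz))),
    }


def first_order_stats(roi):
    """Plain intensity statistics of a region of interest."""
    roi = np.asarray(roi, dtype=float)
    return {"mean": roi.mean(), "variance": roi.var(), "std": roi.std()}


roi = np.array([[0, 0, 1],
                [0, 0, 1],
                [2, 2, 2]])
features = {**texture_features(glcm(roi, levels=3)), **first_order_stats(roi)}
```

A feature vector like `features`, computed per region of interest, is the kind of input an SVM or KNN classifier would then consume.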
Randomized Dimensionality Reduction for k-means Clustering
We study the topic of dimensionality reduction for k-means clustering.
Dimensionality reduction encompasses the union of two approaches: feature
selection and feature extraction. A feature selection based algorithm
for k-means clustering selects a small subset of the input features and then
applies k-means clustering on the selected features. A feature extraction
based algorithm for k-means clustering constructs a small set of new
artificial features and then applies k-means clustering on the constructed
features. Despite the significance of k-means clustering as well as the
wealth of heuristic methods addressing it, provably accurate feature selection
methods for k-means clustering are not known. On the other hand, two provably
accurate feature extraction methods for k-means clustering are known in the
literature; one is based on random projections and the other is based on the
singular value decomposition (SVD).
This paper makes further progress towards a better understanding of
dimensionality reduction for k-means clustering. Namely, we present the first
provably accurate feature selection method for k-means clustering and, in
addition, we present two feature extraction methods. The first feature
extraction method is based on random projections and it improves upon the
existing results in terms of time complexity and number of features needed to
be extracted. The second feature extraction method is based on fast approximate
SVD factorizations and it also improves upon the existing results in terms of
time complexity. The proposed algorithms are randomized and provide
constant-factor approximation guarantees with respect to the optimal k-means
objective value.
Comment: IEEE Transactions on Information Theory, to appear.
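The two families of feature extraction mentioned above can be illustrated with a small NumPy sketch: a random projection onto `r` new features using a scaled random ±1 matrix, and a projection onto the top-`k` right singular vectors, each followed by k-means in the reduced space and scored with the k-means objective in the original space. This sketch does not reproduce the paper's specific constructions, its fast approximate SVD, its feature selection method, or its guarantees. It uses an exact SVD and a plain Lloyd's k-means with farthest-first initialization (a heuristic), and all function names are hypothetical.

```python
import numpy as np


def random_projection(X, r, seed=0):
    """Extract r artificial features with a scaled random +/-1 matrix."""
    rng = np.random.default_rng(seed)
    R = rng.choice([-1.0, 1.0], size=(X.shape[1], r)) / np.sqrt(r)
    return X @ R


def svd_projection(X, k):
    """Extract k artificial features: coordinates in the top-k right singular subspace."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # exact, not approximate, SVD
    return X @ Vt[:k].T


def kmeans(X, k, n_iter=20):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])  # farthest point from chosen centers
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def kmeans_cost(X, labels):
    """k-means objective: squared distances to cluster centroids, in the ORIGINAL space."""
    return float(sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                     for c in np.unique(labels)))


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 50)), rng.normal(100, 1, (20, 50))])
for Z in (random_projection(X, 10), svd_projection(X, 2)):
    labels = kmeans(Z, 2)        # cluster in the reduced space...
    cost = kmeans_cost(X, labels)  # ...but evaluate in the original space
```

Scoring the partition in the original space is what an approximation guarantee for these methods is about: clustering the cheap, low-dimensional features should not cost much more than clustering the full data.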
Face Detection with Effective Feature Extraction
There is an abundant literature on face detection due to its important role
in many vision applications. Since Viola and Jones proposed the first real-time
AdaBoost based face detector, Haar-like features have been adopted as the
method of choice for frontal face detection. In this work, we show that simple
features other than Haar-like features can also be used to train an
effective face detector. Since a single feature is not discriminative enough to
separate faces from difficult non-faces, we further improve the generalization
performance of our simple features by introducing feature co-occurrences. We
demonstrate that our proposed features yield a performance improvement compared
to Haar-like features. In addition, our findings indicate that features play a
crucial role in the ability of the system to generalize.
Comment: 7 pages. Conference version published in Asian Conf. Comp. Vision 201
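The abstract does not name the simple features it proposes, so the sketch below shows only the Haar-like baseline it compares against: a two-rectangle feature evaluated in constant time on an integral image, as popularized by Viola and Jones. It also shows one hypothetical way to encode feature co-occurrences, by binarizing several feature responses and packing the bits into a joint index that a weak classifier could use as a lookup-table key. The thresholds, the packing scheme, and all function names are assumptions, not the authors' method.

```python
import numpy as np


def integral_image(img):
    """ii[r, c] = sum of img[:r, :c]; padded with a leading zero row and column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii


def rect_sum(ii, r, c, h, w):
    """Sum of the h x w rectangle with top-left corner (r, c), in O(1)."""
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]


def two_rect_feature(ii, r, c, h, w):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, r, c, h, half) - rect_sum(ii, r, c + half, h, half)


def cooccurrence_index(values, thresholds):
    """Binarize several feature responses and pack the bits into one joint index.

    A weak classifier can then look up one of 2**len(values) joint outcomes
    instead of thresholding a single feature on its own.
    """
    index = 0
    for v, t in zip(values, thresholds):
        index = (index << 1) | int(v > t)
    return index


window = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(window)
f1 = two_rect_feature(ii, 0, 0, 4, 4)
joint = cooccurrence_index([f1, rect_sum(ii, 1, 1, 2, 2)], [0.0, 20.0])
```

The integral image is what makes rectangle features cheap enough for real-time detection: every rectangle sum costs four lookups regardless of its size.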